Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames
نویسندگان
چکیده
This paper proposes a wide-range anaphora resolution system toward text understanding. This system resolves zero, direct and indirect anaphors in Japanese texts by integrating two sorts of linguistic resources: a hand-annotated corpus with various relations and automatically constructed case frames. The corpus has relevance tags which consist of predicate-argument relations, relations between nouns and coreferences, and is utilized for learning parameters of the system and testing it. The case frames are indispensable knowledge both for detecting zero/indirect anaphors and estimating appropriate antecedents. Our preliminary experiments showed promising results.
منابع مشابه
A Domain Ontology Production Tool Kit Based on Automatically Constructed Case Frames
This paper proposes a tool kit to produce a domain ontology for text mining, based on case frames automatically constructed from a raw corpus of a specific domain. Since case frames are strongly related to implicit facts hidden in large domain-specific corpora, we can say that case frames are a promising device for text mining. The aim of the tool kit is to enable automatic analysis of event re...
متن کاملThe Automatic Acquisition Of Frequencies Of Verb Subcategorization Frames From Tagged Corpora
We describe a mechanism for automatically acquiring verb subcategorization frames and their frequencies in a large corpus. A tagged corpus is first partially parsed to identify noun phrases and then a finear grammar is used to estimate the appropriate subcategorization frame for each verb token in the corpus. In an experiment involving the identification of six fixed subcategorization frames, o...
متن کاملTowards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...
متن کاملAutomatic Construction of Nominal Case Frames and its Application to Indirect Anaphora Resolution
This paper proposes a method to automatically construct Japanese nominal case frames. The point of our method is the integrated use of a dictionary and example phrases from large corpora. To examine the practical usefulness of the constructed nominal case frames, we also built a system of indirect anaphora resolution based on the case frames. The constructed case frames were evaluated by hand, ...
متن کاملCorpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004